6 research outputs found

    New understanding of gas hydrate phenomena and natural inhibitors in crude oil systems through mass spectrometry and machine learning

    Gas hydrates represent one of the main flow assurance issues in the oil and gas industry, as they can cause complete blockage of pipelines and process equipment, forcing shutdowns. Previous studies have shown that some crude oils form hydrates that do not agglomerate or deposit, but remain as transportable dispersions. This is commonly believed to be due to naturally occurring components present in the crude oil. Despite decades of research, however, their exact structures have not yet been determined. Some studies have suggested that these components are present in the acid fractions of the oils or are related to the asphaltene content of the oils. Crude oils are among the world's most complex organic mixtures and can contain up to 100 000 different constituents, making them difficult to characterise using traditional mass spectrometers. The high mass accuracy of Fourier Transform Ion Cyclotron Resonance Mass Spectrometry (FT-ICR MS) yields a higher resolution than traditional techniques, making FT-ICR MS able to characterise crude oils to a greater extent and possibly identify hydrate-active components. FT-ICR MS spectra usually contain tens of thousands of peaks, so data treatment methods able to find underlying relationships in big data sets are required. Machine learning and multivariate statistics include many methods suitable for big data. A literature review identified a number of promising methods and analysed the current status of machine learning for the analysis of gas hydrates and FT-ICR MS data. The literature study revealed that although many studies have used machine learning to predict thermodynamic properties of gas hydrates, very little work has been done on analysing gas hydrate related samples measured by FT-ICR MS. To aid the identification of hydrate-active components, SINTEF developed a successive accumulation procedure for increasing their concentrations.
Comparison of the mass spectra from spiked and unspiked samples revealed some peaks that increased in intensity with the spiking level. Several classification methods were used in combination with variable selection, and peaks related to hydrate formation were identified. The corresponding molecular formulas were determined, and the peaks were assumed to be related to asphaltenes, naphthenes and polyethylene glycol. To aid the characterisation of the oils, infrared spectroscopy (both Fourier Transform infrared and near infrared) was combined with FT-ICR MS in a multiblock analysis to predict the density of crude oils. Two different strategies for data fusion were attempted, and sequential fusion of the blocks achieved the highest prediction accuracy both before and after reducing the dimensions of the data sets by variable selection. Because crude oils have such complex matrices, samples are often very different, and many methods cannot handle high degrees of variation or non-linearities between the samples. Hierarchical cluster-based partial least squares regression (HC-PLSR) clusters the data and builds local models within each cluster. HC-PLSR can thus handle non-linearities between clusters, but as PLSR is a linear model the data are still required to be locally linear. HC-PLSR was therefore extended to deep learning (HC-CNN and HC-RNN) and support vector regression (HC-SVR). The deep learning-based models outperformed HC-PLSR for a data set predicting average molecular weights from hydrolysed raw materials. The analysis of the FT-ICR MS spectra revealed that the large amounts of information contained in the data (due to the high resolution) can disturb the predictive models, but the use of variable selection counteracts this effect.
Several methods from machine learning and multivariate statistics were proven valuable for prediction of various parameters from FT-ICR MS using both classification and regression methods.

    Current overview and way forward for the use of machine learning in the field of petroleum gas hydrates

    Gas hydrates represent one of the main flow assurance challenges in the oil and gas industry, as they can lead to plugging of pipelines and process equipment. In this paper we present a literature study performed to evaluate the current state of the use of machine learning methods within the field of gas hydrates, with specific focus on the oil chemistry. A common analysis technique for crude oils is Fourier Transform Ion Cyclotron Resonance Mass Spectrometry (FT-ICR MS), which could be a good approach to achieving a better understanding of the chemical composition of hydrates, and the use of machine learning in the field of FT-ICR MS was therefore also examined. Several machine learning methods were identified as promising, their use in the literature was reviewed, and a text analysis study was performed to identify the main topics within the publications. The literature search revealed that only one publication combines FT-ICR MS, machine learning and gas hydrates. Most of the work on gas hydrates is related to thermodynamics, while FT-ICR MS is mostly used for chemical analysis of oils. However, by combining FT-ICR MS and machine learning to evaluate samples related to gas hydrates, it could be possible to improve the understanding of the composition of hydrates and thereby identify the hydrate-active compounds responsible for the differences between oils forming plugging hydrates and oils forming transportable hydrates.

    Using machine learning-based variable selection to identify hydrate related components from FT-ICR MS spectra

    The blockage of pipelines caused by agglomeration of gas hydrates is a major flow assurance issue in the oil and gas industry. Some crude oils form gas hydrates that remain as transportable particles in a slurry. It is commonly believed that naturally occurring components in those crude oils alter the surface properties of gas hydrate particles when formed. The exact structure of the crude oil components responsible for this surface modification remains unknown. In this study, a successive accumulation and spiking of hydrate-active crude oil fractions was performed to increase the concentration of hydrate related compounds. Fourier Transform Ion Cyclotron Resonance Mass Spectrometry (FT-ICR MS) was then utilised to analyse extracted oil samples for each spiking generation. Machine learning-based variable selection was used on the FT-ICR MS spectra to identify the components related to hydrate formation. Among six different methods, Partial Least Squares Discriminant Analysis (PLS-DA) was selected as the best performing model and the 23 most important variables were determined. The FT-ICR MS mass spectra for each spiking level were compared to samples extracted before the successive accumulation, to identify changes in the composition. Principal Component Analysis (PCA) showed differences between the oils and spiking levels, indicating an accumulation of hydrate-active components. Molecular formulas, double bond equivalents (DBE) and hydrogen-carbon (H/C) ratios were determined for each of the selected variables and evaluated. Some variables were identified as possibly asphaltenes and naphthenic acids, which could be related to the positive wetting index (WI) for the oils.
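
The abstract above mentions double bond equivalents (DBE) and H/C ratios derived from molecular formulas. As a minimal sketch of how such values are commonly computed, the snippet below applies the standard relation DBE = C - H/2 + N/2 + 1 for neutral CHNOS formulas. The function names and the example formulas are illustrative assumptions and are not taken from the study.

```python
import re


def parse_formula(formula):
    """Parse a simple formula such as 'C20H30O2' into a dict of element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def dbe(counts):
    """Double bond equivalents for a neutral CHNOS formula: C - H/2 + N/2 + 1."""
    return counts.get("C", 0) - counts.get("H", 0) / 2 + counts.get("N", 0) / 2 + 1


def hc_ratio(counts):
    """Hydrogen-to-carbon atomic ratio; a low value suggests higher aromaticity."""
    return counts.get("H", 0) / counts["C"]


# Hypothetical formulas chosen only to illustrate the calculation.
for formula in ["C6H6", "C20H30O2", "C5H5N"]:
    c = parse_formula(formula)
    print(formula, "DBE =", dbe(c), "H/C =", round(hc_ratio(c), 2))
```

High DBE combined with a low H/C ratio is usually read as a sign of condensed aromatic structures, which is one reason these two quantities are often inspected together when assigning compound classes.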

    Prediction of toxicity in shellfish based on their fatty acid composition

    Lipophilic marine biotoxins can accumulate in shellfish and pose a health risk to humans if consumed. The okadaic acid group is the toxin group that most commonly causes Diarrhetic Shellfish Poisoning (DSP) in Norway. Commercially distributed shellfish are tested for toxins. The analysis used to detect the toxins has some disadvantages (Fux et al. 2008, Aanrud 2016). An alternative to this analysis is to use multivariate statistics to identify toxic shellfish. Analysis of fatty acid profiles in shellfish can be easier and more accurate than measuring toxin content. Statistical methods such as PCR, PLS, CPLS and forward selection were explored to obtain a prediction model for toxicity based on the fatty acid composition. The methods were validated using leave-one-out cross-validation and test set validation. PCA was used to explore groupings or relations in the data. Comparison of score plots and loading plots indicated that blue mussels contain more trans fatty acids and Pacific oysters contain more saturated fatty acids. ANOVA was performed to evaluate the explanatory variables. Sample area was found to be significant (α = 0.05). During this analysis an outlier, B-1443 Rundhaugen, was detected. This sample represents an extreme algal bloom, which could lead to overestimation if it were included in the model. Because sample area appeared to have an effect on toxicity, residuals from the ANOVA were extracted and used as the response in some of the regression methods. The three best methods were PCR, PLS and CPLS, each with log transformation. CVANOVA and a Tukey post hoc test suggested that CPLS with 5 components was the best method.
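
The validation scheme described above (leave-one-out cross-validation on a log-transformed response) can be sketched as follows. This is only an illustration under stated assumptions: ordinary least squares on a single predictor stands in for PCR, PLS and CPLS, and the data are synthetic values invented for the example, not measurements from the thesis.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_rmse(xs, ys):
    """Leave-one-out RMSE: each sample is predicted by a model fitted without it."""
    squared_errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        prediction = slope * xs[i] + intercept
        squared_errors.append((ys[i] - prediction) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Synthetic data (assumption): a response that grows exponentially with the predictor.
fatty_acid = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toxin = [math.exp(0.5 * x + 1.0) for x in fatty_acid]
log_toxin = [math.log(t) for t in toxin]

rmse_raw = loo_rmse(fatty_acid, toxin)      # linear model on the raw response
rmse_log = loo_rmse(fatty_acid, log_toxin)  # linear model on the log response
print(rmse_raw, rmse_log)
```

In this constructed case the log transformation makes the relationship exactly linear, so the cross-validated error on the log scale is essentially zero. The two errors are on different scales, so the comparison only shows how well a linear model fits each form of the response.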

    Increased interpretation of deep learning models using hierarchical cluster-based modelling.

    Linear prediction models based on data with large inhomogeneity or abrupt non-linearities often perform poorly because relationships between groups in the data dominate the model. Given that the data are locally linear, this can be overcome by splitting the data into smaller clusters and creating a local model within each cluster. In this study, the previously published Hierarchical Cluster-based Partial Least Squares Regression (HC-PLSR) procedure was extended to deep learning, in order to increase the interpretability of the deep learning models through local modelling. Hierarchical Cluster-based Convolutional Neural Networks (HC-CNNs), Hierarchical Cluster-based Recurrent Neural Networks (HC-RNNs) and Hierarchical Cluster-based Support Vector Regression models (HC-SVRs) were implemented and tested on two data sets: spectroscopic data consisting of Fourier Transform Infrared (FT-IR) measurements of raw material dry films, used to predict average molecular weight during hydrolysis, and a simulated data set constructed to contain three clusters of observations with different non-linear relationships between the independent variables and the response. HC-CNN, HC-RNN and HC-SVR outperformed HC-PLSR for the simulated data set, showing the disadvantage of PLSR for highly non-linear data, but for the FT-IR data set there was little to gain in prediction ability from using more complex models than HC-PLSR. Local modelling can ease the interpretation of deep learning models by highlighting differences in feature importance between different regions of the input or output space. Our results showed clear differences in feature importance between the various local models, which demonstrates the advantages of a local modelling approach with regard to interpretation of deep learning models.
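
The idea of splitting the data into clusters and fitting one local model per cluster can be sketched as below. This is a simplified illustration, not the published HC-PLSR procedure: it assumes a single predictor, uses a basic one-dimensional k-means in place of the clustering step, and uses ordinary least squares in place of PLSR or a neural network. The data are synthetic.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def sse(xs, ys, slope, intercept):
    """Sum of squared residuals of a fitted line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def kmeans_1d(xs, k=2, iterations=20):
    """Very small 1-D k-means with deterministic, evenly spaced starting centres."""
    low, high = min(xs), max(xs)
    centres = [low + (high - low) * j / (k - 1) for j in range(k)]
    labels = [0] * len(xs)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: abs(x - centres[j])) for x in xs]
        for j in range(k):
            members = [x for x, label in zip(xs, labels) if label == j]
            if members:
                centres[j] = sum(members) / len(members)
    return labels


# Synthetic data (assumption): two groups, each with its own linear relationship.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0, 11.0, 12.0, 13.0, 14.0]
ys = [2 * x + 1 for x in xs[:5]] + [-3 * x + 50 for x in xs[5:]]

# One global model for all samples.
global_sse = sse(xs, ys, *fit_line(xs, ys))

# One local model per cluster.
labels = kmeans_1d(xs, k=2)
local_sse = 0.0
for cluster in sorted(set(labels)):
    cx = [x for x, label in zip(xs, labels) if label == cluster]
    cy = [y for y, label in zip(ys, labels) if label == cluster]
    slope, intercept = fit_line(cx, cy)
    local_sse += sse(cx, cy, slope, intercept)
    print("cluster", cluster, "slope", slope, "intercept", intercept)

print("global SSE:", global_sse, "local SSE:", local_sse)
```

The local slopes differ in sign between the two clusters, which is the kind of between-region difference the abstract refers to when it says local modelling can ease interpretation.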